{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "prospective-pastor",
   "metadata": {},
   "source": [
    "# Python 预缓存的 Property 修饰器简单实现"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "rational-shield",
   "metadata": {},
   "source": [
    "> 创建时间：2021-03-19"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "frank-floor",
   "metadata": {},
   "source": [
    "这份简短笔记，我们会讨论预缓存 Python Property (类属性) 的简单实现。\n",
    "\n",
    "在运行程序时，特别是对于耗时但不耗内存、以后需要经常取用的计算，我们会希望找个内存或硬盘空间储存起来。为了方便起见，我们只讨论借用内存的方法。\n",
    "\n",
    "假设现在的问题是，我们要计算 $c = a+b, d = a^2$。为了调用的便利，$a, b$ 两个变量 (作为常数) 作为 property。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "alpine-trade",
   "metadata": {},
   "source": [
    "但麻烦之处在于，$a, b$ 的值并不容易求，求完之后对内存的消耗却又不大。(嘛先不要追问为什么求个 1+1 要这么复杂)\n",
    "\n",
    "$$\n",
    "\\begin{align}\n",
    "a &= \\sum_{n = 1}^{\\infty} \\frac{1}{2^n} \\simeq \\sum_{n = 1}^{1000} \\frac{1}{2^n} \\\\\n",
    "b &= \\int_0^1 3 x^2 \\, \\mathrm{d} x \\simeq \\sum_{n = 0}^{5000} \\frac{3 n^2}{5000^3}\n",
    "\\end{align}\n",
    "$$\n",
    "\n",
    "同时，我们也不清楚末端用户是否需要求 $c$ (需要同时计算 $a$, $b$)，还是需要求 $d$ (只需要计算 $a$ 即可)。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "collectible-encyclopedia",
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_a():\n",
    "    a = 0\n",
    "    for n in range(1, 1001):\n",
    "        a += 1 / 2**n\n",
    "    return a"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "ready-musician",
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_b():\n",
    "    b = 0\n",
    "    for n in range(5001):\n",
    "        b += 3 * n**2 / 5000**3\n",
    "    return b"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "federal-certificate",
   "metadata": {},
   "source": [
    "这篇文档讨论四种做法。第一种做法简单但低效；第二、三种做法代码较复杂；第四种代码简单且不会产生多余的计算。作者倾向使用 [第三种](#偷懒的做法：将赋值函数嵌入-getter-函数) 与 [第四种](#改进的做法：缩减隐含变量的声明) 做法。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "casual-lodge",
   "metadata": {},
   "source": [
    "## 即时调用 property 定义方法"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "viral-closer",
   "metadata": {},
   "source": [
    "最简单粗暴的方法是需要 $a, b$ 时就现场计算。在第一次调用 $a, b$ 时固然需要耗时的计算，但第二次调用仍然会相当费时。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "sensitive-collapse",
   "metadata": {},
   "outputs": [],
   "source": [
    "class Dummy:\n",
    "    \n",
    "    @property\n",
    "    def a(self):\n",
    "        return get_a()\n",
    "    \n",
    "    @property\n",
    "    def b(self):\n",
    "        return get_b()\n",
    "    \n",
    "    @property\n",
    "    def c(self):\n",
    "        return self.a + self.b\n",
    "    \n",
    "    @property\n",
    "    def d(self):\n",
    "        return self.a**2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "aware-empty",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2.00030002"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dum = Dummy()\n",
    "dum.c"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "pointed-partnership",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2.41 ms ± 172 µs per loop (mean ± std. dev. of 7 runs, 50 loops each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit -n 50\n",
    "dum.c"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "collective-device",
   "metadata": {},
   "source": [
    "## 一般的 property 做法"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "superior-radar",
   "metadata": {},
   "source": [
    "为了避免多余的计算，一般的方法是，需要首先在 `__init__` 中声明两个隐含变量 `_a`, `_b` 以保存结果。在使用 `a`, `b` 两个 property 之前，先要使用 setter 函数作 $a, b$ 的计算并分别保存到 `_a`, `_b` 中；随后再用 getter 函数调用它们。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "civilian-graduate",
   "metadata": {},
   "outputs": [],
   "source": [
    "class General:\n",
    "    \n",
    "    def __init__(self):\n",
    "        self._a = NotImplemented\n",
    "        self._b = NotImplemented\n",
    "    \n",
    "    @property\n",
    "    def a(self):\n",
    "        return self._a\n",
    "    \n",
    "    @a.setter\n",
    "    def a(self, val):\n",
    "        self._a = val\n",
    "    \n",
    "    @property\n",
    "    def b(self):\n",
    "        return self._a\n",
    "    \n",
    "    @b.setter\n",
    "    def b(self, val):\n",
    "        self._b = val\n",
    "    \n",
    "    @property\n",
    "    def c(self):\n",
    "        return self.a + self.b\n",
    "    \n",
    "    @property\n",
    "    def d(self):\n",
    "        return self.a**2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "subjective-driving",
   "metadata": {},
   "source": [
    "如果没有预先使用 setter 函数，就会碰到下面这种尴尬的情况："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "incorporated-alaska",
   "metadata": {},
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "unsupported operand type(s) for +: 'NotImplementedType' and 'NotImplementedType'",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m-------------------------------------------------\u001b[0m",
      "\u001b[0;31mTypeError\u001b[0m       Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-7-ed9f42ab8466>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      1\u001b[0m \u001b[0mgen\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mGeneral\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mgen\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mc\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;32m<ipython-input-6-7edbb0413f0d>\u001b[0m in \u001b[0;36mc\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m     23\u001b[0m     \u001b[0;34m@\u001b[0m\u001b[0mproperty\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     24\u001b[0m     \u001b[0;32mdef\u001b[0m \u001b[0mc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 25\u001b[0;31m         \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0ma\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mb\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m     26\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     27\u001b[0m     \u001b[0;34m@\u001b[0m\u001b[0mproperty\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mTypeError\u001b[0m: unsupported operand type(s) for +: 'NotImplementedType' and 'NotImplementedType'"
     ]
    }
   ],
   "source": [
    "gen = General()\n",
    "gen.c"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "special-girlfriend",
   "metadata": {},
   "source": [
    "因此，正确的调用方式是"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "offensive-permission",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(2.0, 1.0)"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gen = General()\n",
    "gen.a, gen.b = get_a(), get_b()\n",
    "gen.c, gen.d"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "arabic-portrait",
   "metadata": {},
   "source": [
    "上面的步骤是耗时的，但随后当要调用 `a`, `b` 变量时，就会快捷很多："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "given-tracy",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "265 ns ± 28 ns per loop (mean ± std. dev. of 7 runs, 50 loops each)\n"
     ]
    }
   ],
   "source": [
    "%%timeit -n 50\n",
    "gen.c"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ordered-johns",
   "metadata": {},
   "source": [
    "但这里有一个问题：$a, b$ 的值实际上可以看作常数；如果末端用户真正希望得到的是 $d = a^2$ 而非 $c = a + b$，那么实际上用户不需要 $b$，自然也就不需要对其花时间赋值了。决定是否要对 $b$ 赋值的任务由此交给末端用户，这会造成一些困扰。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "modular-cookbook",
   "metadata": {},
   "source": [
    "## 偷懒的做法：将赋值函数嵌入 getter 函数"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ecological-perfume",
   "metadata": {},
   "source": [
    "如果这个任务交给程序编写者，那么一种最简单的实现方式是把赋值函数嵌入到 getter 函数中：\n",
    "\n",
    "- 如果 $a$ 的值已经被计算过，那么就从缓存空间 `_a` 取出该值；\n",
    "\n",
    "- 如果 $a$ 被调用前没有被计算过，那么就计算该值并放入缓存 `_a`。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "rotary-update",
   "metadata": {},
   "outputs": [],
   "source": [
    "class Improved:\n",
    "    \n",
    "    def __init__(self):\n",
    "        self._a = NotImplemented\n",
    "        self._b = NotImplemented\n",
    "    \n",
    "    @property\n",
    "    def a(self):\n",
    "        if self._a is NotImplemented:\n",
    "            self._a = get_a()\n",
    "        return self._a\n",
    "    \n",
    "    @property\n",
    "    def b(self):\n",
    "        if self._b is NotImplemented:\n",
    "            self._b = get_b()\n",
    "        return self._b\n",
    "    \n",
    "    @property\n",
    "    def c(self):\n",
    "        return self.a + self.b\n",
    "    \n",
    "    @property\n",
    "    def d(self):\n",
    "        return self.a**2"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "acknowledged-purple",
   "metadata": {},
   "source": [
    "如果末端用户只需要求 $d = a^2$，那么耗费时间的关键步就只有 $a$ 的计算；缓存空间 `_b` 就是空的。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "anticipated-trick",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1.0\n",
      "NotImplemented\n"
     ]
    }
   ],
   "source": [
    "imp = Improved()\n",
    "print(imp.d)\n",
    "print(imp._b)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "above-ladder",
   "metadata": {},
   "source": [
    "同时，以后再需要调用 $d$ 时，$a$ 的值也不会再被计算第二次。\n",
    "\n",
    "当然，这种做法的弊端是，用户原则上无权限更改 $a, b$ 的值 (通过更改隐含变量 `_a`, `_b` 是可能的，但这违背了 PEP8 的程序规范)。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "adverse-bones",
   "metadata": {},
   "source": [
    "## 改进的做法：缩减隐含变量的声明"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "correct-samuel",
   "metadata": {},
   "source": [
    "但上面的定义仍然有很多冗余。对于每个 property，我们总要声明隐含变量、调用时判断是否缓存空间存在。这两步可以通过改编在 property 修饰器内部增加一段代码方便地实现。这个修饰器我们命名为 `cached_property`："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "distinct-study",
   "metadata": {},
   "outputs": [],
   "source": [
    "def cached_property(f):\n",
    "    def wrap(*args, **kwargs):\n",
    "        self = args[0]                                                    # self\n",
    "        _f = \"_\" + f.__name__                                             # _a\n",
    "        if not hasattr(self, _f) or getattr(self, _f) is NotImplemented:  # if self._a is NotImplemented:\n",
    "            setattr(self, _f, f(*args))                                   # self._a = get_a()\n",
    "        return getattr(self, _f)                                          # return self._a\n",
    "    return property(wrap)                                                 # make this wrap a property"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "painful-mayor",
   "metadata": {},
   "source": [
    "这样之后，不仅代码量减少很多 (调用方式与最简单的 `Dummy` 完全一致)，同时也避免多余重复的计算。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "opponent-lover",
   "metadata": {},
   "outputs": [],
   "source": [
    "class Advanced:\n",
    "    \n",
    "    @cached_property\n",
    "    def a(self):\n",
    "        return get_a()\n",
    "    \n",
    "    @cached_property\n",
    "    def b(self):\n",
    "        return get_b()\n",
    "    \n",
    "    @property\n",
    "    def c(self):\n",
    "        return self.a + self.b\n",
    "    \n",
    "    @property\n",
    "    def d(self):\n",
    "        return self.a**2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "prime-bunch",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1.0\n",
      "False\n"
     ]
    }
   ],
   "source": [
    "adv = Advanced()\n",
    "print(adv.d)\n",
    "print(hasattr(adv, \"_b\"))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
